Improving Open Information Extraction using Domain Knowledge
Authors
Abstract
Open Information Extraction (OIE) aims to identify all possible assertions within a sentence. The most recent, and thus most efficient, OIE tools rely on the grammatical dependencies or the syntactic tree of the sentence to perform extraction. When they produce a wrong extraction, it is mainly due to parsing errors. In this paper, we propose to handle these parsing errors before performing OIE itself. To achieve this goal, we focus on multi-word expressions (MWEs), which account for more than 45% of wrong extractions. We show how the MWE problem can be handled in a given domain and how the unbreakable property of MWEs serves as a good filter for OIE.
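The idea of using the unbreakable property as a filter can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon `DOMAIN_MWES`, the whitespace tokenizer, and every function name are hypothetical and chosen only for illustration. The sketch assumes that an extraction is rejected when one of its parts covers only a fragment of a known domain MWE.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)
Span = Tuple[int, int]         # token offsets [start, end)

# Hypothetical domain lexicon of multi-word expressions (an assumption).
DOMAIN_MWES = ["heart attack", "blood pressure"]


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer, used only for this sketch."""
    return text.lower().split()


def find_span(tokens: List[str], phrase: str) -> Optional[Span]:
    """Return the first token span where `phrase` occurs, or None."""
    parts = tokenize(phrase)
    n = len(parts)
    if n == 0:
        return None
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == parts:
            return (i, i + n)
    return None


def find_mwe_spans(tokens: List[str], mwes: List[str]) -> List[Span]:
    """Collect the spans of every lexicon MWE occurring in the sentence."""
    spans = []
    for mwe in mwes:
        parts = tokenize(mwe)
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                spans.append((i, i + n))
    return spans


def breaks_mwe(span: Span, mwe_spans: List[Span]) -> bool:
    """True if `span` overlaps an MWE without containing it entirely."""
    start, end = span
    for m_start, m_end in mwe_spans:
        overlaps = start < m_end and m_start < end
        contains = start <= m_start and m_end <= end
        if overlaps and not contains:
            return True
    return False


def filter_extractions(sentence: str, triples: List[Triple],
                       mwes: List[str]) -> List[Triple]:
    """Keep only the triples whose parts leave every MWE unbroken."""
    tokens = tokenize(sentence)
    mwe_spans = find_mwe_spans(tokens, mwes)
    kept = []
    for triple in triples:
        spans = [find_span(tokens, part) for part in triple]
        if not any(s is not None and breaks_mwe(s, mwe_spans) for s in spans):
            kept.append(triple)
    return kept


if __name__ == "__main__":
    sentence = "the patient suffered a heart attack yesterday"
    candidates = [
        ("the patient", "suffered", "a heart attack"),
        ("the patient", "suffered a heart", "attack yesterday"),
    ]
    for triple in filter_extractions(sentence, candidates, DOMAIN_MWES):
        print(triple)
```

In this sketch the second candidate is dropped because its relation and object split "heart attack". Matching each part to its first occurrence in the sentence is a simplification; a real system would carry token offsets through from the extractor.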
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
Assessing and Improving Domain Knowledge Representation in DBpedia
With the development of knowledge graphs and the billions of triples generated on the Linked Data cloud, it is paramount to ensure the quality of data. In this work, we focus on one of the central hubs of the Linked Data cloud, DBpedia. In particular, we assess the quality of DBpedia for domain knowledge representation. Our results show that DBpedia still has much room for improvement in this r...
An Open Architecture for Multi-Domain Information Extraction
This paper presents a multi-domain information extraction system. The overall architecture of the system is detailed. A set of machine learning tools helps the expert to explore the corpus and automatically derive knowledge from this corpus. Thus, the system allows the end-user to rapidly develop a local ontology giving an accurate image of the content of the text, so that the expert can elabor...
Building a Knowledge base through Open Information Extraction techniques
The emergence of the Open-IE paradigm has given way to domain-independent techniques for information extraction. Its capability of extracting new relations from text shows potential for the task of building a knowledge base. However, its advantages also come with disadvantages, such as the lack of context expressed by the relations extracted by Open-IE systems. This research attempts to ...